Introduction to HPC
Scientific Programming with Python
Manuel Holtgrewe
Berlin Institute of Health at Charité
Session Overview
Aims
- Learn about important applications for Python in scientific programming …
- … in the biomedical/life sciences.
Considered Applications
- “Data science wrangling” with Polars
- Fast numerical computations with SciPy/NumPy
- Interfacing with R for Statistics
- Machine learning with scikit-learn
- Deep Learning with TensorFlow
Jupyter Notebooks
- Introduction
- Installation and Usage
Data Wrangling with Polars
- Introduction
- Loading and Writing Data
- “Tidy Data” with Polars
- Data Visualization
Introduction
- Polars features
- Polars vs pandas
Loading and Writing Data (1)
Loading Data
Loading and Writing Data (2)
Writing Data
Tidy Data with Polars (1)
What is tidy data?
R tidyverse
Tidy Data with Polars (2)
tidypolars
Data Visualization (1)
Plotly Express
Data Visualization (2)
Vega
Data Visualization (3)
Bokeh
Data Visualization (4)
It’s quite intimidating. How to get started?
Conversion Between Polars and Pandas (1)
Conversion Between Polars and Pandas (2)
- example code & explanation
Conversion Between Polars and Pandas (3)
- example code & explanation
Numerical Computation with SciPy/NumPy
- Introduction
- Input and Output
- Arrays and Shapes
- Vectorized Operations
Introduction to SciPy/NumPy
NumPy
- Numerical computing data structures
- high-performance vectors/matrices
- Numerical computing algorithms
- vectorized operations, random numbers …
- linear algebra
SciPy
- Fundamental scientific algorithms
- clustering
- interpolation / smoothing
- statistics
- sparse matrices
- image and signal processing
- FFT, integration
👉 Mostly low level characteristics, many other libraries build on top.
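To make the division of labour a bit more concrete, here is a small sketch that touches three of the listed areas: random numbers and linear algebra from NumPy, and numerical integration from SciPy. The matrix, seed, and integrand are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate

# NumPy: reproducible random numbers.
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=1000)

# NumPy: linear algebra, solve A @ x = b for x.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 7.0])
solution = np.linalg.solve(A, b)

# SciPy: integrate sin(x) from 0 to pi (the exact value is 2).
integral, abs_error = integrate.quad(np.sin, 0.0, np.pi)

print(noise.shape, solution, integral)
```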
Arrays and Shapes
- vectors
- matrices
- data types
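The three bullet points can be illustrated with a short sketch; the values are arbitrary and the data types are set explicitly so that the result does not depend on the platform.

```python
import numpy as np

# A vector is a one-dimensional array.
vector = np.array([1.0, 2.0, 3.0])

# A matrix is a two-dimensional array; here with an explicit data type.
matrix = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(vector.shape, vector.dtype)  # (3,) float64
print(matrix.shape, matrix.dtype)  # (2, 3) int64

# Reshaping keeps the elements but changes the layout of the axes.
reshaped = matrix.reshape(3, 2)

# Converting the data type creates a new array.
as_float = matrix.astype(np.float32)
```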
Vectorized Operations (1)
Example
Vectorized Operations (2)
Benchmark vs. Python List
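A rough sketch of such a benchmark, using only `timeit` and NumPy. The array size and the operation (`2 * x + 1`) are arbitrary choices, and the measured times depend on the machine, so no numbers are quoted here.

```python
import timeit

import numpy as np

n = 200_000
values_list = [float(i) for i in range(n)]
values_array = np.arange(n, dtype=np.float64)


def scale_list():
    """Element-wise computation with a Python list comprehension."""
    return [2.0 * v + 1.0 for v in values_list]


def scale_array():
    """The same computation as one vectorized NumPy expression."""
    return 2.0 * values_array + 1.0


list_seconds = timeit.timeit(scale_list, number=5)
array_seconds = timeit.timeit(scale_array, number=5)
print(f"list:  {list_seconds:.4f} s")
print(f"array: {array_seconds:.4f} s")
```

The vectorized version is typically much faster because the loop runs in compiled code rather than in the Python interpreter.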
Interfacing with R for Statistics
- Introduction
- Low-Level Approaches
- Using rpy2
Low-Level Approaches
- via disk
- call R scripts from Python
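Both low-level approaches can be combined: exchange data via files on disk and run an R script as a subprocess. The following is a sketch under assumptions, not a recommended pipeline: the helper `mean_via_r`, the file names, and the tiny R script are hypothetical, and the R call only runs if `Rscript` is found on the `PATH`.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical R script: read a CSV, compute a mean, write a CSV.
R_SCRIPT = """
args <- commandArgs(trailingOnly = TRUE)
data <- read.csv(args[1])
result <- data.frame(mean_value = mean(data$value))
write.csv(result, args[2], row.names = FALSE)
"""


def mean_via_r(values, work_dir):
    """Compute the mean of `values` in R, exchanging data via CSV files."""
    work_dir = Path(work_dir)
    input_path = work_dir / "input.csv"
    output_path = work_dir / "output.csv"
    script_path = work_dir / "mean.R"

    # Python -> disk
    with input_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        writer.writerows([v] for v in values)
    script_path.write_text(R_SCRIPT)

    # Call the R script; check=True raises an error if R fails.
    subprocess.run(
        ["Rscript", str(script_path), str(input_path), str(output_path)],
        check=True,
    )

    # disk -> Python
    with output_path.open(newline="") as handle:
        row = next(csv.DictReader(handle))
    return float(row["mean_value"])


if shutil.which("Rscript") is not None:
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(mean_via_r([1.0, 2.0, 3.0], tmp_dir))
else:
    print("Rscript not found on PATH; skipping the R call.")
```

This is simple and robust, but every call pays for starting R and for serializing the data, which is what motivates a tighter integration.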
Using rpy2 (1)
data transfer vs pandas and data frames
Using rpy2 (2)
Some examples:
- Student’s t-test
- Kolmogorov-Smirnov test
- Fisher test
Machine Learning with scikit-learn
- Introduction
- Estimator Cheat Sheet
- Example: Clustering
- Example: Regression
- Example: Classification
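As one possible classification example, here is a minimal sketch using the Iris data set that ships with scikit-learn. The choice of data set, of logistic regression as the estimator, and of the split parameters are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load features X (150 samples, 4 measurements) and class labels y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# All scikit-learn estimators share the fit/predict interface.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Because estimators share the same interface, swapping `LogisticRegression` for another classifier usually changes only the import and the constructor line.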
Deep Learning with TensorFlow
- Introduction
- Installation
- “TensorFlow 2 Quickstart for Beginners”
- Running on the HPC
“TensorFlow 2 Quickstart for Beginners”
https://www.tensorflow.org/tutorials/quickstart/beginner
Deep Learning with TensorFlow
Bring Your Own Project
🫵 Where can you apply what you have learned in your PhD project?
This is not the end…
… but all for this session
Recap
- Overview of Python in scientific programming
- Data wrangling with Polars
- Numerical computations with SciPy/NumPy
- Interfacing with R for Statistics
- Machine Learning with scikit-learn
- Deep Learning with TensorFlow